An Assessment of Language Elicitation without the Supervision of a Linguist

Authors

  • Alison Alvarez
  • Lori Levin
  • Robert Frederking
  • Jill Lehman

Abstract

The AVENUE machine translation system is designed for resource-poor scenarios in which parallel corpora are not available. In this situation, parallel corpora are created by bilingual consultants who translate an elicitation corpus into their languages. This paper evaluates the elicitation corpus: is it designed well enough that a bilingual consultant can produce reliable data without the supervision of a linguist? We evaluated two translations of the elicitation corpus, one into Thai and one into Bengali. Two types of evaluation were conducted: an error analysis of the translations produced by the Thai and Bengali consultants, and a comparison of Example-Based MT (EBMT) systems trained on the original human translations and on corrected translations.

[Figure: AVENUE Elicitation Tool. Language pair shown is Spanish/Mapudungun.]

Linguistic Resource: REFLEX

As part of a U.S. government project called REFLEX, we produced an elicitation corpus of 3124 English sentences, which the Linguistic Data Consortium (LDC) is translating into a number of languages, beginning with Thai and Bengali. Unlike the AVENUE scenario, no hand alignments were done, and the AVENUE team did not supervise the translators.

Elicitation Corpus: Example Sentences

  • Mary is writing a book for John.
  • Who let him eat the sandwich?
  • Who had the machine crush the car?
  • They did not make the policeman run.
  • Our brothers did not destroy files.
  • He said that there is not a manual.
  • The teacher who wrote a textbook left.
  • The policeman chased the man who was a thief.
  • Mary began to work.

Elicitation Corpus: Detailed Example

srcsent: We baked cookies.
context: We = 5 men

((actor ((np-function fn-actor)
         (np-general-type pronoun-type)
         (np-person person-first)
         (np-identifiability identifiable)
         (np-pronoun-exclusivity inclusivity-neutral)
         (np-number num-pl)
         (np-biological-gender bio-gender-male)
         (np-animacy anim-human)
         (np-specificity specific)
         (np-pronoun-antecedent antecedent-not-specified)
         (np-distance distance-neutral)))
 (undergoer ((np-function fn-undergoer)
             (np-person person-third)
             (np-identifiability unidentifiable)
             (np-number num-pl)
             (np-specificity non-specific)
             (np-animacy anim-inanimate)
             (np-biological-gender bio-gender-n/a)
             (np-general-type common-noun-type)
             (np-pronoun-exclusivity inclusivity-n/a)
             (np-pronoun-antecedent antecedent-n/a)
             (np-distance distance-neutral)))
 (c-polarity polarity-positive)
 (c-v-absolute-tense past)
 (c-v-lexical-aspect activity-accomplishment)
 (c-general-type declarative-clause)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 ...

Figure 1: An abridged feature structure, a source language sentence, and its context field.

Minimal Pairs: Change vs. No Change

  a. Sentence: You wrote. Context: You = five men. Translation: antum katabtum
  b. Sentence: You wrote. Context: You = two men. Translation: antumaa katabtumaa
  c. Sentence: You wrote. Context: You = five men. Translation: escribieron
  d. Sentence: You wrote. Context: You = two men. Translation: escribieron

Figure 2: Context information isn't always incorporated into target language translations. In Modern Standard Arabic (2a and 2b), the two sentences are translated differently depending on how many people 'You' refers to, whereas the Spanish translations (2c and 2d) are identical. This example and further ones can be found in our translator guide.

Elicitation Error Analysis: Statistics

Thai Elicitation Errors
  Error type                        Count    Share
  Source Sentence Over-Translation    845   79.41%
  Context Over-Translation             57    5.35%
  Under-Translation                    88    8.48%
  Mistranslation                       68    6.39%
  Grammar Mistakes                      6    0.19%
  Total                              1064     100%

Bengali Elicitation Errors
  Error type                        Count    Share
  Source Sentence Over-Translation      0     0.0%
  Context Over-Translation             24    6.68%
  Under-Translation                     5    1.39%
  Mistranslation                       76   21.17%
  Grammar and Spelling Mistakes       254   70.75%
  Total                               359     100%
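Each percentage in the error tables is a category's count divided by that language's total count (1064 Thai errors; the Bengali counts sum to 359). The minimal Python sketch below recomputes the percentages from the reported counts; the dictionary and function names are hypothetical, chosen for illustration, and are not part of any AVENUE or REFLEX tooling. Recomputed this way, the Thai Under-Translation and Grammar Mistakes rows come to 8.27% and 0.56% rather than the 8.48% and 0.19% listed, while every other row matches the tables to within 0.01 percentage points.

```python
# Counts copied from the Thai and Bengali elicitation error tables.
THAI_ERRORS = {
    "Source Sentence Over-Translation": 845,
    "Context Over-Translation": 57,
    "Under-Translation": 88,
    "Mistranslation": 68,
    "Grammar Mistakes": 6,
}

BENGALI_ERRORS = {
    "Source Sentence Over-Translation": 0,
    "Context Over-Translation": 24,
    "Under-Translation": 5,
    "Mistranslation": 76,
    "Grammar and Spelling Mistakes": 254,
}


def error_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each error category's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no errors to summarize")
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


if __name__ == "__main__":
    for language, counts in (("Thai", THAI_ERRORS), ("Bengali", BENGALI_ERRORS)):
        print(f"{language}: {sum(counts.values())} errors")
        for category, share in error_shares(counts).items():
            print(f"  {category:<34}{counts[category]:>5}  {share:6.2f}%")
```

Rounding to two decimals (rather than truncating) is a choice made for this sketch; it is why the recomputed Source Sentence Over-Translation share reads 79.42% instead of 79.41%.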

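The context field in Figure 1 is a feature structure written as nested (feature value) lists. To make that structure concrete, here is a minimal Python sketch, not the AVENUE tool's own code, that parses such an s-expression and turns it into nested dictionaries so individual features can be looked up. It assumes every element is a two-item (name value) pair whose value is either an atom or a further list of pairs; because Figure 1 is abridged, the sketch runs on a shortened, closed excerpt of it. The names `parse_sexpr`, `to_features`, and `FIGURE_1_EXCERPT` are hypothetical.

```python
import re
from typing import Union

# A parsed s-expression: either an atom (string) or a list of parsed s-expressions.
SExpr = Union[str, list]

# Tokens are "(", ")", or any run of characters that is neither whitespace nor a parenthesis.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_sexpr(text: str) -> SExpr:
    """Parse exactly one parenthesized s-expression into nested lists of strings."""
    stack: list[list] = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])            # open a new nested list
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()      # close the innermost list
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)       # atom such as "np-number" or "num-pl"
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete expression")
    return stack[0][0]


def to_features(expr: list) -> dict:
    """Turn a list of (name value) pairs into a dict, recursing into nested feature lists."""
    features = {}
    for pair in expr:
        if not (isinstance(pair, list) and len(pair) == 2 and isinstance(pair[0], str)):
            raise ValueError(f"expected a (name value) pair, got {pair!r}")
        name, value = pair
        features[name] = value if isinstance(value, str) else to_features(value)
    return features


# Hypothetical, shortened excerpt of Figure 1 (the full figure is abridged).
FIGURE_1_EXCERPT = """
((actor ((np-function fn-actor) (np-person person-first) (np-number num-pl)
         (np-biological-gender bio-gender-male) (np-animacy anim-human)))
 (undergoer ((np-function fn-undergoer) (np-person person-third) (np-number num-pl)
             (np-animacy anim-inanimate)))
 (c-polarity polarity-positive)
 (c-v-absolute-tense past)
 (c-general-type declarative-clause))
"""

if __name__ == "__main__":
    fs = to_features(parse_sexpr(FIGURE_1_EXCERPT))
    print(fs["actor"]["np-number"])      # num-pl
    print(fs["c-v-absolute-tense"])      # past
```

For example, looking up `["actor"]["np-number"]` yields `num-pl`, the plural marking that the context "We = 5 men" encodes for the actor.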
Publication date: 2007